--- permalink: /identifying-repositioning-candidates-for-antineoplastics.html --- Narrative
In [1]:
from ipypublish.scripts.ipynb_latex_setup import *

Identifying Repositioning Candidates for Antineoplastics

Abstract

The Cancer Genome Atlas (TCGA) has collected mutation and expression data for over 20,000 tumor samples, but most subtypes of cancer have few normal tissue samples to compare against. We uniformly computed expression data for both TCGA and The Genotype Tissue Expression Constortium (GTEx), which collected expression data from thousands of normal tissue samples, to create a large repository of cancer and normal expression data free of computational batch effects. Combined expression data was validated by identifying known cancer phenotypes for several antineoplastic drug targets and finding similar expression patterns in both TCGA and GTEx. Repositioning candidates were found by identifying cancer subtypes that share phenotypes with the positively validated targets.

Contents

In [2]:
# Set autoreload module for dev
%load_ext autoreload
%autoreload 2
%aimport rnaseq_lib
In [3]:
# Imports
import pandas as pd
import rnaseq_lib as r
import holoviews as hv
hv.extension('bokeh', logo=False)
In [4]:
# Inputs
## Synapse ID: syn11515015
df_path = '/mnt/rnaseq-cancer/Objects/tcga-gtex-metadata-expression.tsv'
df = pd.read_csv(df_path, sep='\t', index_col=0, dtype=r.tissues.dtype)
In [5]:
# Holoviews object wrapper for dataframe
h = r.plot.Holoview(df)

Introduction

Previous large scale comparisons of gene expression in TCGA have been done with fewer than 5,000 samples — a majority (~90%) of the samples derived from primary tumor tissue and the rest representing 'normal' tissue, some of which is taken from the same patient carrying the tumor [Guo et al, 2013., Unresolved citation: peng_large-scale_2015-1.]. In one study, the dataset consisted of 4,043 tumor samples and 548 normal tissue samples across 21 TCGA cancer types — an average of only 26 normal samples for each cancer type compared to ~200 tumor samples [Peng et al, 2015.]. While gene expression in normal tissue is more homogeneous than tumor samples, batch effects and contamination are a common problem with RNA-seq, which complicates an already noisy data source. Additionally, the small sample sizes cannot accurately reflect the general population, and therapeutics guided by results obtained from these analyses may have a higher likelihood of failing in clinical trials. Given the lack of normals in TCGA, the second largest dataset processed in the large-scale compute are non-cancerous samples collected from GTEx, which provides valuable insights into the mechanisms of gene regulation by studying human gene expression and regulation in tissues from healthy individuals [Consortium et al, 2015.].

There are a myriad of antineoplastic drugs designed to fight different types of cancer, but most only target a small population within a single subtype of cancer, which gives most patient few options. By validating known expression biomarkers for antineoplastics, these same expression motifs can be used to identify candidate subtypes of cancer that may respond to treatment.

RNA-seq Datasets

In [13]:
Image('figures/Datasets.png', width=400)
Out[13]:

The above figure, which depicts three different RNA-seq datasets containing ~20,000 samples, totals more than 110 terabytes of patient data — more data than can fit on most machines and far too much data to process efficiently on most hardware available to academics. Two of the three are cancer datasets, including The Cancer Genome Atlas (TCGA) which includes over 11,000 patients across 33 tumor types and represents the largest tumor collection of tumor data [{The Cancer Genome Atlas Research Network} et al, 2013.]. GTEx contains over 8,000 samples spanning almost every tissue in the human body, with the goal providing a homogenous set of expression data representing healthy tissue.

Large-scale RNA-seq Compute

In order to process this massive combined dataset in a timely and efficient manner, our lab developed Toil, a distributed workflow platform capable of massive scale [Vivian et al, 2017.]. I wrote a Toil-based RNA-seq workflow which provides results that are concordant to TCGA's RNA-seq workflow, but is an order of magnitude faster and provides quantification outputs from multiple tools [Vivian et al, .].

In [14]:
Image('figures/toil-rnaseq.png', width=600)
Out[14]:

The workflow was run on all 20,000 samples with a throughput of 99.6% on an Amazon Web Services cluster that peaked at 32,000 cores and 60TB of memory and finished in just under 4 days with room for improvement.

In [16]:
Image('figures/shared_cores.png', width=600)
Out[16]:

GTEx as a Prior for TCGA Normals

While GTEx contains thousands of normal tissue samples, they can't be compared directly to TCGA due to differences in sequencing depth and laboratory batch effects. Unfortunately, there don't exist standard RNA-seq benchmark samples that every consortium uses to calibrate with before processing, which would likely introduce fewer batch effects that are easier to correct. Current available methods typically attempt naive distribution fitting that tend to work less effectively as the amount of samples and classes increases [Johnson et al, 2007., Shaham et al, 2017.]. Instead, we can evaluate GTEx as a prior by normalizing for sequencing depth and dispersion, then comparing differential expression results for protein-coding genes between TCGA normals and GTEx normals to see how good a prior GTEx is.

In [17]:
h.sample_counts()
Out[17]:
In [18]:
tissues = ['Breast', 'Colon', 'Kidney', 'Liver', 'Lung', 'Prostate', 'Stomach', 'Thyroid', 'Uterus']
de_gtex = h.differential_expression_tissue_concordance(tissue_subset=tissues, tcga=False).relabel('GTEx')
de_tcga = h.differential_expression_tissue_concordance(tissue_subset=tissues, gtex=False).relabel('TCGA')
In [71]:
%%opts HeatMap [width=450 height=425]
(de_gtex + de_tcga).relabel('Differential Expression Gene Concordance (PearsonR)')
Out[71]:

For almost every tissue with a sufficient number of samples, the GTEx tissue counterpart is the closest approximate to the TCGA normal. Many tissues, like Bladder, Esophagus, Pancreas, and Skin have so few TCGA normals that the GTEx counterpart not being highly concordant isn't suprising. This is corroborated by GTEx being the closest approximate for tissues with larger TCGA normal sample sizes like Breast and Kidney.

Sequencing Data to Repositioning Candidates

To start identifying repositioning candidates, comprehensive information on current FDA-approved antineoplastics was collated and biomarkers were identified for each drug. Drugs whose primary biomarker was expression-based were selected for further investigation.

In [21]:
Image('figures/Expression_Discovery_Methods.png', width=600)
Out[21]:
In [23]:
Image('figures/drugs.png', width=800)
Out[23]:

Carbonic Anhydrase 9

Tumor hypoxia is associated clinically with therapeutic resistance and poor patient outcomes. One feature of tumor hypoxia is activated expression of carbonic anhydrase IX (CA9), a regulator of pH and tumor growth. Disruption of the downstream bicarbonate products can acidify tumor cells and suppress tumor growth [McIntyre et al, 2016.]. Hypoxia also promotes tumour heterogeneity through the epigenetic regulation of CA9 [Ledaki et al, 2015.]. CA9 is also a transmembrane protein and is stained for for use as an endogenous marker for investigating hypoxia [Newbold et al, 2009.].

CA9 is part of a family of carbonic anhydrases (zinc metalloenzymes) that catalyze reversible hydration of carbion dioxide to form carbonic acid. Girentuximab (trade name Rencarex) is a chimeric IgG1 monoclonal antibody to carbonic anhydrase IX which was granted fast track status and orphan drug designation by the FDA for renal cancer [, .]. In January 2017, Telix Pharmaceuticals Limited, an Australian biotechnology company, announced that it had in-licensed Girentuximab for use as a radioimmunoconjugate, iodine (124I) girentuximab, called Redectane [Unresolved citation: wilexwilex.].

CA9 is reported as a ubiquitous marker in renal cell carcinoma (RCC) [Chen et al, 2005., Turner et al, 2002., Kim et al, 2005., Wykoff et al, 2000.], and should be simple to validate by examing the expression distributions of CA9.

In [33]:
h.gene_kde(gene='CA9', tissue_subset=['Kidney'])
Out[33]:

CA9 is significantly upregulated with GTEx and TCGA normal samples for kidney sharing similar distributions of low expression. A comparison of Bland-Altman plots [Altman et al, 1983.] for kidney also show almost identical upregulation of CA9 as well as neighboring upregulated genes.

In [26]:
sequence = ['blue', 'red']
In [130]:
%%opts Scatter [color_index='label' size_index='size' width=450] (cmap=sequence)
label = {'CA9': ['CA9']}
extents = (0, -12, 20, 12)
ca9_gtex = h.tissue_de('Kidney', gene_labels=label, extents=extents).relabel('GTEx') 
ca9_tcga = h.tissue_de('Kidney', gene_labels=label, tcga_normal=True, extents=extents).relabel('TCGA')
hv.Layout([ca9_gtex, ca9_tcga]).relabel('CA9 Overexpression in TCGA and GTEx')
Out[130]:

\todo[inline]{Benedict: 62/100 of the top most DE genes are shared between Tumor/Normal and Tumor/GTEx}

In [64]:
# NOTE: 62 / 100 of the top most differentially expressed genes are shared
top_gtex = ca9_gtex.data.sort_values('l2fc', ascending=False).index[:100].tolist()
top_tcga = ca9_tcga.data.sort_values('l2fc', ascending=False).index[:100].tolist()

Now we examine CA9 distribution in the context of all tissues as CA9 upregulation has also been reported in colorectal, uterine cervical, hepatocellular, and bladder cancers. [Urquidi et al, 2012., Lee et al, 2007., Cleven et al, 2007., Bandla et al, 2012., Onishi et al, 2011.].

In [44]:
h.gene_distribution(gene='CA9')
Out[44]:

Every normal tissue has low expression except for esophagus, stomach, and testis. Esophagus (tcga-normal) can be ignored due to its low sample size. Stomach normal tissue appears to upregulate CA9 which is seen in literature, "Non-cancerous [gastric] tissues strongly expressed CA9 with membranous localisation" [Chen et al, 2005.]. Now we can compare the average fold change and expression of all tissues to identify tissues with similar expression profiles for CA9.

In [58]:
path = [(4, 4), (9.5, 4), (9.5, 8), (4, 8), (4, 4)]
h.gene_de('CA9', extents=(1.3, -4.3, 14, 12)) * \
hv.Path([path]) * \
h.highlight_points(13.821, 11.643, color='red', size=0.4)
Out[58]:

Kidney is an extreme outlier, but several tissues possess both high levels of expression as well as signifcant L2FC for CA9 (blue boxed area).

  • Bladder
    • Very few normal samples
    • "CA9 are differentially regulated in superficial vs invasive bladder cancer" [Turner et al, 2002.]
    • "Carbonic Anhydrase [...] as Urinary Biomarkers for Bladder Cancer Detection" [Urquidi et al, 2012.]
  • Pancreas
    • Very few TCGA normals
    • "Hypoxia activates the hedgehog signaling pathway in a ligand-independent manner by upregulation of Smo transcription in pancreatic cancer" [Onishi et al, 2011.]
  • Uterus
    • Few TCGA normals
    • "Tumor carbonic anhydrase 9 expression is associated with the presence of lymph node metastases in uterine cervical cancer" [Lee et al, 2007.]
  • Colon
    • "Stromal expression of hypoxia regulated proteins is an adverse prognostic factor in colorectal carcinomas." [Cleven et al, 2007.]
  • Lung
    • "Expression of Hypoxia-inducible Carbonic Anhydrase-9 Relates to Angiogenic Pathways and Independently to Poor Outcome in Non-Small Cell Lung Cancer" [Giatromanolaki et al, 2001.]
  • Esophagus
    • Very few TCGA normals
    • "We also observed higher frequency gains at 9p (13% versus 4%; p = 0.04) containing putative cancer loci such as CA9" - Comparative Genomics of Esophageal Adenocarcinoma and Squamous Cell Carcinoma. [Bandla et al, 2012.]

CA9-Associated Pathway Genes

Pathway information from TBD shows three upstream proteins that transcriptionally promote CA9: SP1, SP3, and the master transcriptional regulator of cellular and developmental response to hypoxia [Wang et al, 1995.]. Tumor hypoxia also instigates upregulation of the VEGF pathway (angiogenesis) and GLUT1/SLC2A1 (metabolism) downstream of CA9 [Chung et al, 2009., Hoskin et al, 2003.].

In [137]:
%%opts HeatMap [width=450]
candidates = ['Pancreas', 'Uterus', 'Colon', 'Lung', 'Esophagus', 'Kidney']
non_candidates = list(set(df.tissue.unique()) - set(candidates))
upstream_genes = ['SP1', 'SP3', 'HIF1A', 'ARNT'] 
downstream_genes = ['SLC2A1', 'VEGFA', 'KDR', 'FLT1']
# Plot up / downstream genes to CA9
up_de = h.gene_de_heatmap(genes=upstream_genes, tissue_subset=candidates).relabel('Upstream')
down_de = h.gene_de_heatmap(genes=downstream_genes, tissue_subset=candidates).relabel('Downstream')

(up_de + down_de)
Out[137]:

Kidney tumor samples significantly upregulate the downstream VEGF/GLUT1 genes. GLUT1 is overexpressed in most tumor samples, but not to the degree that the candidate tissues are.

In [109]:
glut_dist = pd.DataFrame()
glut_dist['l2fc'] = [3.87, 1.48, 3.37, 4.0, 5.9, 3.69] + [2.3, 2.0, 0.2, 2.2, 0.9, 1.4, -1.7, 3.1, 3.7, 2.3]
glut_dist['class'] = ['Candidates' for _ in candidates] + ['Non-Candidates' for _ in non_candidates]
hv.BoxWhisker(glut_dist, kdims=['class'], vdims=['l2fc']).relabel('GLUT1 Upregulation')
Out[109]:
In [136]:
%%opts Overlay [tabs=True] BoxWhisker [width=600]
hv.Overlay([h.gene_distribution('SLC2A1', tissue_subset=[x], types=True).relabel(x) for x in sorted(candidates)])
Out[136]:

Discussion

CA9 is a ubiquitous tumor biomarker for RCC with possibilities for treatment from new immunoconjugatives derived from Girentuximab [, .]. Exploration of CA9 differential expression across cancers in TCGA and GTEx reveal significant upregulation across several cancer subtypes in addition to upregulation for other hypoxic markers. This suggests that some cancers in other tissues, like hepatocellular adenocarcinoma, may benefit from CA9-targeted drugs.

Programmed Cell Death Ligand 1

Cancer cells sometimes accrue mutations that assist them in avoiding detection by the immune system. One of these mechanisms is programmed death-ligand 1 (PD-L1; CD274) which is expressed on many cancer and immune cells and plays an important part in blocking the ‘cancer immunity cycle’ by binding programmed death-1 (PD-1; PDCD1) and B7.1 (CD80), both of which are negative regulators of T-lymphocyte activation [Herbst et al, 2014.]. PD-L1 is widely expressed by antigen-presenting cells (e.g., macrophages, B cells, dendritic cells) as a way of fine-tuning peripheral immune activation and avoiding autoimmunity [Inman et al, 2017.]. There are several FDA-approved antineoplastics that work through interference of PD1/PD-L1 such as Atezolizumab, Avelumab, Durvalumab, and Nivolumab.

Atezolizumab is a humanized monoclonal immunoglobulin G1 antibody to PD-L1 that is used in cancer immunotherapy. Inhibition of PD-L1 overcomes the usual block in immune surveillance of tumor cell neoantigens and can induce remissions in several forms of advanced, metastatic cancer including urothelial (bladder and urethral) carcinoma and NSCLC [Rosenberg et al, 2016., Balar et al, 2017., Fehrenbacher et al, 2016.].

Avelumab binds PD-L1 and blocks the interaction between PD-L1 and its receptors PD-1 and B7.1. This interaction releases the inhibitory effects of PD-L1 on the immune response resulting in the restoration of immune responses, including anti-tumor immune responses and growth. Avelumab has been tested in urothelial and skin carcinomas [Kaufman et al, 2016., Disis et al, 2015., Apolo et al, 2017., Gulley et al, 2015.].

Durvalumab is a programmed death-ligand 1 (PD-L1) blocking antibody indicated for the treatment of patients with locally advanced or metastatic urothelial carcinoma or NSCLC [Massard et al, 2016., Antonia et al, 2017., Planchard et al, 2016.].

Nivolumab is a fully human immunoglobulin G4 (IgG4) monoclonal antibody that selectively inhibits programmed cell death-1 (PD-1) activity by binding to the PD-1 receptor to block the ligands PD-L1 and PD-L2 from binding. The negative PD-1 receptor signaling that regulates T-cell activation and proliferation is therefore disrupted [Robert et al, 2014.]. This releases PD-1 pathway-mediated inhibition of the immune response, including the antitumor immune response [FDA, .]. Nivolumab is used as a first line treatment for inoperable metastatic melanoma in combination with ipilimumab if the cancer does not have a mutation in BRAF, as a second-line treatment following treatment with ipilimumab and if the cancer has a mutation in BRAF, with a BRAF inhibitor [{van Rooij} et al, 2013., Wolchok et al, 2013.]. It is also used as a second-line treatment for squamous non-small cell lung cancer [Brahmer et al, 2015.], non-squamous NSCLC [Borghaei et al, 2015.], and as a second-line treatment for renal cell carcinoma [Motzer et al, 2015.]. Recently Nivolumab has also been used to treat bladder cancer [Powles et al, 2014.]. Side effects of Nivolumab include severe immune-related inflammation of the lungs, colon, liver, kidneys, and thyroid, and there are effects on skin, central nervous system, the heart, and the digestive system [FDA, .].

In [139]:
%%opts Overlay [width=400]
PDL1 = 'CD274'
pos_tissues = ['Skin', 'Kidney', 'Lung', 'Bladder']

kdes = hv.Layout([h.gene_kde(gene=PDL1, tissue_subset=[t]).relabel(t) for t in pos_tissues])
kdes.relabel('PD-L1 Expression').cols(2)
Out[139]:

Skin and kidney samples both have upregulation in the tumor samples compared to their normal counterparts. Bladder does as well if the TCGA-normal samples are discarded, but there are very few samples in either the GTEx or TCGA-normal dataset. Neither squamous or non-squamous NSC lung cancer samples are upregulated compared to their GTEx or TCGA-normal counterpart, but that corrobrates a recent report stating that Nivolumab didn't perform any better than chemotherapy for treating lung cancer [Loftus et al, 2016.].

In [161]:
h.l2fc_by_perc_samples(gene=PDL1, tissue_subset=pos_tissues).relabel('PD-L1 Samples by Fold Change')
Out[161]:

About half of renal cell carcinoma and melanoma samples have greater than log2(2) fold change relative to their GTEx counterpart, whereas lung adenocarcinoma has fewer than 10%.

In [149]:
h.gene_distribution(PDL1).relabel('PD-L1 Expression Across Tissues')
Out[149]:

Aside from known positives, hepatocellular cancer, stomach adenocarcinoma, thyroid carcinoma, and testicular germ cell tumors are all differentially expressed with respect to their GTEx counterpart. Hepatocellular and testicular cancers both have no TCGA normal samples. Stomach and thyroid are both listed as side-effect tissues of Nivolumab [Unresolved citation: fda125554s012lbl.pdf.]. Next we identify tissues with similar differential expression motifs.

In [150]:
de_plot = h.gene_de(PDL1)
d = de_plot.data
# Use kidney for L2FC cutoff
tissues = d[d.L2FC >= float(d[d.Tissue == 'Kidney'].L2FC)]

# Box positive control tissues and those nearby
top = h.highlight_points(xs=tissues.Expression, ys=tissues.L2FC)
pos = [h.highlight_points(xs=d[d.Tissue==x].Expression, 
                          ys=d[d.Tissue==x].L2FC, color='red') for x in pos_tissues]
hv.Overlay([de_plot, top] + pos)
Out[150]:
In [159]:
candidates = ['Testis', 'Stomach', 'Thyroid']
h.l2fc_by_perc_samples(gene=PDL1, tissue_subset=['Testis', 'Stomach', 'Thyroid'])
Out[159]:

Stomach, testes, and thyroid samples share similar differential expression patterns to skin, kidney, and bladder tissues, with about ~50% of tumor samples possessing a greater than log2(2) fold change. This corroborates findings for a study on gastric cancers in 2016, "In this population of patients with recurrent or metastatic PD-L1-positive gastric cancer, [anti-PD1 mAb] had a manageable toxicity profile and promising antitumour activity, warranting further study in phase 2 and 3 trials" [Muro et al, 2016.]. High PD-L1 expression has also been associated with poor prognosis in hepatocellular carcinoma where "Monoclonal antibodies against PD-L1 or PD-1 induced a substantial antitumor effect on murine pancreatic cancer in vivo" [Nomi et al, 2007.] and has been used in conjunction with other drugs that treat hepatocellular carcinoma in cell lines [Feig et al, 2013.]. This dataset also validates studies that show frequent upregulation of PD-L1 in testicular germ cell tumors [Fankhauser et al, 2015., Cierna et al, 2016.].

Discussion and Future Work

\todo[inline]{Emailed Valdo re: pathway files}

\todo[inline]{Emailed Olena re: meeting}

\todo[inline]{Need to email committee}